Overview

Dataset statistics

Number of variables: 15
Number of observations: 99003
Missing cells: 177
Missing cells (%): < 0.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 11.3 MiB
Average record size in memory: 120.0 B

Variable types

NUM: 14
CAT: 1

Warnings

dob_year is highly correlated with age (High correlation)
age is highly correlated with dob_year (High correlation)
mobile_likes_received is highly correlated with likes_received (High correlation)
likes_received is highly correlated with mobile_likes_received and 1 other field (High correlation)
www_likes_received is highly correlated with likes_received (High correlation)
likes_received is highly skewed (γ1 = 112.0745682) (Skewed)
mobile_likes_received is highly skewed (γ1 = 107.5312999) (Skewed)
www_likes_received is highly skewed (γ1 = 126.257317) (Skewed)
userid has unique values (Unique)
friend_count has 1962 (2.0%) zeros (Zeros)
friendships_initiated has 2997 (3.0%) zeros (Zeros)
likes has 22308 (22.5%) zeros (Zeros)
likes_received has 24428 (24.7%) zeros (Zeros)
mobile_likes has 35056 (35.4%) zeros (Zeros)
mobile_likes_received has 30003 (30.3%) zeros (Zeros)
www_likes has 60999 (61.6%) zeros (Zeros)
www_likes_received has 36864 (37.2%) zeros (Zeros)
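
The skewness (γ1) and zero-share figures in these warnings can be approximated in a few lines of Python. The sketch below uses the moment-based population formula for γ1 on a small made-up sample; the estimator used by the report may apply a bias correction, and the helper names skewness and zero_share are illustrative, not part of pandas-profiling.

```python
def skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5 (population form, no bias correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5


def zero_share(values):
    """Fraction of entries that are exactly zero."""
    return sum(1 for v in values if v == 0) / len(values)


# Hypothetical toy sample: mostly small counts plus one very large value,
# loosely mimicking the shape of a column such as likes_received.
sample = [0, 0, 1, 2, 3, 5, 8, 1000]

print(zero_share(sample))    # share of zeros in the sample
print(skewness(sample) > 1)  # a long right tail gives strongly positive skewness
```

A single extreme value is enough to push γ1 well above 1, which is why the three likes-received columns, whose maxima are hundreds of times their 95-th percentiles, are flagged as skewed.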

Reproduction

Analysis started: 2021-01-24 14:04:08.743751
Analysis finished: 2021-01-24 14:05:13.314395
Duration: 1 minute and 4.57 seconds
Software version: pandas-profiling v2.9.0
Download configuration: config.yaml
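
A report like this one is typically generated with a few lines of Python. The sketch below is an assumption about how that might look with pandas-profiling 2.x: the function name build_report, the CSV path, the report title, and the output file name are all hypothetical, since the report does not state where the data came from.

```python
def build_report(csv_path, output_path="report.html"):
    """Profile a CSV file and write an HTML report; returns the output path."""
    # Imported lazily so this module still loads where the libraries are absent.
    import pandas as pd
    from pandas_profiling import ProfileReport

    df = pd.read_csv(csv_path)
    report = ProfileReport(df, title="Dataset profile")  # title is illustrative
    report.to_file(output_path)
    return output_path


# Example call (hypothetical file name; the source path is not given in the report):
# build_report("dataset.csv")
```

The downloadable config.yaml listed above records the settings used for this run. Passing it back to the profiler should reproduce the same set of analyses, but the exact option for doing so depends on the installed version.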

Variables

userid
Real number (ℝ≥0)

UNIQUE

Distinct: 99003
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1597045.208
Minimum: 1000008
Maximum: 2193542
Zeros: 0
Zeros (%): 0.0%
Memory size: 773.5 KiB

Quantile statistics

Minimum: 1000008
5-th percentile: 1060618.3
Q1: 1298805.5
Median: 1596148
Q3: 1895744
95-th percentile: 2133357.1
Maximum: 2193542
Range: 1193534
Interquartile range (IQR): 596938.5

Descriptive statistics

Standard deviation: 344059.1775
Coefficient of variation (CV): 0.2154348391
Kurtosis: -1.199556831
Mean: 1597045.208
Median Absolute Deviation (MAD): 298438
Skewness: 0.0001076605667
Sum: 1.581122667e+11
Variance: 1.183767176e+11
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value                 Count  Frequency (%)
1159224               1      < 0.1%
1129202               1      < 0.1%
1055510               1      < 0.1%
1855227               1      < 0.1%
2110369               1      < 0.1%
1991449               1      < 0.1%
2128666               1      < 0.1%
1884335               1      < 0.1%
2082123               1      < 0.1%
1026848               1      < 0.1%
Other values (98993)  98993  > 99.9%

Minimum 5 values

Value    Count  Frequency (%)
1000008  1      < 0.1%
1000013  1      < 0.1%
1000015  1      < 0.1%
1000038  1      < 0.1%
1000059  1      < 0.1%

Maximum 5 values

Value    Count  Frequency (%)
2193542  1      < 0.1%
2193538  1      < 0.1%
2193522  1      < 0.1%
2193499  1      < 0.1%
2193485  1      < 0.1%

age
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 101
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 37.28022383
Minimum: 13
Maximum: 113
Zeros: 0
Zeros (%): 0.0%
Memory size: 773.5 KiB

Quantile statistics

Minimum: 13
5-th percentile: 15
Q1: 20
Median: 28
Q3: 50
95-th percentile: 90
Maximum: 113
Range: 100
Interquartile range (IQR): 30

Descriptive statistics

Standard deviation: 22.58974831
Coefficient of variation (CV): 0.6059445462
Kurtosis: 1.561446767
Mean: 37.28022383
Median Absolute Deviation (MAD): 10
Skewness: 1.415260654
Sum: 3690854
Variance: 510.2967289
Monotonicity: Not monotonic
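
Several of the derived statistics for age follow directly from the others, which gives a quick consistency check. The snippet below is only a sketch: it recomputes the coefficient of variation, the interquartile range, and the range from the figures printed above, and confirms that the variance is the squared standard deviation. The variable names are illustrative.

```python
# Figures copied from the age summary in this report.
mean, std, variance = 37.28022383, 22.58974831, 510.2967289
q1, q3, minimum, maximum = 20, 50, 13, 113

cv = std / mean                  # coefficient of variation
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range

print(round(cv, 6))              # compare with the reported CV
print(iqr, value_range)          # compare with the reported IQR and range
print(abs(std ** 2 - variance) < 1e-3)  # variance equals the squared standard deviation
```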
Histogram with fixed size bins (bins=50)
Common values

Value              Count  Frequency (%)
18                 5196   5.2%
23                 4404   4.4%
19                 4391   4.4%
20                 3769   3.8%
21                 3671   3.7%
25                 3641   3.7%
17                 3283   3.3%
16                 3086   3.1%
22                 3032   3.1%
24                 2827   2.9%
Other values (91)  61703  62.3%

Minimum 5 values

Value  Count  Frequency (%)
13     484    0.5%
14     1925   1.9%
15     2618   2.6%
16     3086   3.1%
17     3283   3.3%

Maximum 5 values

Value  Count  Frequency (%)
113    202    0.2%
112    18     < 0.1%
111    18     < 0.1%
110    15     < 0.1%
109    9      < 0.1%

dob_day
Real number (ℝ≥0)

Distinct: 31
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 14.53040817
Minimum: 1
Maximum: 31
Zeros: 0
Zeros (%): 0.0%
Memory size: 773.5 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 7
Median: 14
Q3: 22
95-th percentile: 29
Maximum: 31
Range: 30
Interquartile range (IQR): 15

Descriptive statistics

Standard deviation: 9.015606359
Coefficient of variation (CV): 0.6204647697
Kurtosis: -1.188960111
Mean: 14.53040817
Median Absolute Deviation (MAD): 8
Skewness: 0.1078407568
Sum: 1438554
Variance: 81.28115802
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=31)
Common values

Value              Count  Frequency (%)
1                  7900   8.0%
10                 4030   4.1%
15                 3555   3.6%
5                  3545   3.6%
12                 3413   3.4%
2                  3409   3.4%
3                  3291   3.3%
17                 3266   3.3%
20                 3263   3.3%
14                 3219   3.3%
Other values (21)  60112  60.7%

Minimum 5 values

Value  Count  Frequency (%)
1      7900   8.0%
2      3409   3.4%
3      3291   3.3%
4      3217   3.2%
5      3545   3.6%

Maximum 5 values

Value  Count  Frequency (%)
31     1507   1.5%
30     2530   2.6%
29     2508   2.5%
28     2955   3.0%
27     2755   2.8%

dob_year
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 101
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1975.719776
Minimum: 1900
Maximum: 2000
Zeros: 0
Zeros (%): 0.0%
Memory size: 773.5 KiB

Quantile statistics

Minimum: 1900
5-th percentile: 1923
Q1: 1963
Median: 1985
Q3: 1993
95-th percentile: 1998
Maximum: 2000
Range: 100
Interquartile range (IQR): 30

Descriptive statistics

Standard deviation: 22.58974831
Coefficient of variation (CV): 0.01143368032
Kurtosis: 1.561446767
Mean: 1975.719776
Median Absolute Deviation (MAD): 10
Skewness: -1.415260654
Sum: 195602185
Variance: 510.2967289
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value              Count  Frequency (%)
1995               5196   5.2%
1990               4404   4.4%
1994               4391   4.4%
1993               3769   3.8%
1992               3671   3.7%
1988               3641   3.7%
1996               3283   3.3%
1997               3086   3.1%
1991               3032   3.1%
1989               2827   2.9%
Other values (91)  61703  62.3%

Minimum 5 values

Value  Count  Frequency (%)
1900   202    0.2%
1901   18     < 0.1%
1902   18     < 0.1%
1903   15     < 0.1%
1904   9      < 0.1%

Maximum 5 values

Value  Count  Frequency (%)
2000   484    0.5%
1999   1925   1.9%
1998   2618   2.6%
1997   3086   3.1%
1996   3283   3.3%

dob_month
Real number (ℝ≥0)

Distinct: 12
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.283365151
Minimum: 1
Maximum: 12
Zeros: 0
Zeros (%): 0.0%
Memory size: 773.5 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 3
Median: 6
Q3: 9
95-th percentile: 12
Maximum: 12
Range: 11
Interquartile range (IQR): 6

Descriptive statistics

Standard deviation: 3.529671569
Coefficient of variation (CV): 0.5617485987
Kurtosis: -1.240397572
Mean: 6.283365151
Median Absolute Deviation (MAD): 3
Skewness: 0.03129550742
Sum: 622072
Variance: 12.45858138
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=12)
Common values

Value             Count  Frequency (%)
1                 11772  11.9%
10                8476   8.6%
5                 8271   8.4%
8                 8266   8.3%
3                 8110   8.2%
7                 8021   8.1%
9                 7939   8.0%
12                7894   8.0%
4                 7810   7.9%
2                 7632   7.7%
Other values (2)  14812  15.0%

Minimum 5 values

Value  Count  Frequency (%)
1      11772  11.9%
2      7632   7.7%
3      8110   8.2%
4      7810   7.9%
5      8271   8.4%

Maximum 5 values

Value  Count  Frequency (%)
12     7894   8.0%
11     7205   7.3%
10     8476   8.6%
9      7939   8.0%
8      8266   8.3%

gender
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 175
Missing (%): 0.2%
Memory size: 773.5 KiB

Value      Count  Frequency (%)
male       58574  59.2%
female     40254  40.7%
(Missing)  175    0.2%

Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%
Histogram of lengths of the category

Length

Max length: 6
Median length: 4
Mean length: 4.811419856
Min length: 3

tenure
Real number (ℝ≥0)

Distinct: 2426
Distinct (%): 2.5%
Missing: 2
Missing (%): < 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 537.8873749
Minimum: 0
Maximum: 3139
Zeros: 70
Zeros (%): 0.1%
Memory size: 773.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 47
Q1: 226
Median: 412
Q3: 675
95-th percentile: 1575
Maximum: 3139
Range: 3139
Interquartile range (IQR): 449

Descriptive statistics

Standard deviation: 457.6498739
Coefficient of variation (CV): 0.8508284359
Kurtosis: 2.199058275
Mean: 537.8873749
Median Absolute Deviation (MAD): 213
Skewness: 1.535680925
Sum: 53251388
Variance: 209443.4071
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value                Count  Frequency (%)
300                  173    0.2%
303                  170    0.2%
242                  164    0.2%
272                  163    0.2%
257                  161    0.2%
297                  161    0.2%
285                  160    0.2%
280                  160    0.2%
284                  158    0.2%
278                  158    0.2%
Other values (2416)  97373  98.4%

Minimum 5 values

Value  Count  Frequency (%)
0      70     0.1%
1      60     0.1%
2      72     0.1%
3      79     0.1%
4      86     0.1%

Maximum 5 values

Value  Count  Frequency (%)
3139   3      < 0.1%
3129   1      < 0.1%
3128   1      < 0.1%
3101   1      < 0.1%
3019   1      < 0.1%

friend_count
Real number (ℝ≥0)

ZEROS

Distinct: 2562
Distinct (%): 2.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 196.3507873
Minimum: 0
Maximum: 4923
Zeros: 1962
Zeros (%): 2.0%
Memory size: 773.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 3
Q1: 31
Median: 82
Q3: 206
95-th percentile: 720
Maximum: 4923
Range: 4923
Interquartile range (IQR): 175

Descriptive statistics

Standard deviation: 387.304229
Coefficient of variation (CV): 1.972511719
Kurtosis: 50.09427289
Mean: 196.3507873
Median Absolute Deviation (MAD): 64
Skewness: 6.059008484
Sum: 19439317
Variance: 150004.5658
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value                Count  Frequency (%)
0                    1962   2.0%
1                    1816   1.8%
2                    1117   1.1%
3                    860    0.9%
5                    789    0.8%
4                    749    0.8%
10                   737    0.7%
24                   732    0.7%
6                    720    0.7%
29                   719    0.7%
Other values (2552)  88802  89.7%

Minimum 5 values

Value  Count  Frequency (%)
0      1962   2.0%
1      1816   1.8%
2      1117   1.1%
3      860    0.9%
4      749    0.8%

Maximum 5 values

Value  Count  Frequency (%)
4923   1      < 0.1%
4917   1      < 0.1%
4863   1      < 0.1%
4845   1      < 0.1%
4844   1      < 0.1%

friendships_initiated
Real number (ℝ≥0)

ZEROS

Distinct: 1519
Distinct (%): 1.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 107.4524711
Minimum: 0
Maximum: 4144
Zeros: 2997
Zeros (%): 3.0%
Memory size: 773.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 17
Median: 46
Q3: 117
95-th percentile: 418
Maximum: 4144
Range: 4144
Interquartile range (IQR): 100

Descriptive statistics

Standard deviation: 188.786951
Coefficient of variation (CV): 1.756934475
Kurtosis: 42.53560096
Mean: 107.4524711
Median Absolute Deviation (MAD): 36
Skewness: 5.150757415
Sum: 10638117
Variance: 35640.51287
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value                Count  Frequency (%)
0                    2997   3.0%
1                    2212   2.2%
2                    1551   1.6%
3                    1355   1.4%
4                    1352   1.4%
6                    1328   1.3%
5                    1328   1.3%
11                   1319   1.3%
8                    1314   1.3%
13                   1279   1.3%
Other values (1509)  82968  83.8%

Minimum 5 values

Value  Count  Frequency (%)
0      2997   3.0%
1      2212   2.2%
2      1551   1.6%
3      1355   1.4%
4      1352   1.4%

Maximum 5 values

Value  Count  Frequency (%)
4144   1      < 0.1%
3654   1      < 0.1%
3594   1      < 0.1%
3538   1      < 0.1%
3415   1      < 0.1%

likes
Real number (ℝ≥0)

ZEROS

Distinct: 2924
Distinct (%): 3.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 156.0787855
Minimum: 0
Maximum: 25111
Zeros: 22308
Zeros (%): 22.5%
Memory size: 773.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 1
Median: 11
Q3: 81
95-th percentile: 726
Maximum: 25111
Range: 25111
Interquartile range (IQR): 80

Descriptive statistics

Standard deviation: 572.2806808
Coefficient of variation (CV): 3.666614134
Kurtosis: 200.4456878
Mean: 156.0787855
Median Absolute Deviation (MAD): 11
Skewness: 11.02370356
Sum: 15452268
Variance: 327505.1777
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value                Count  Frequency (%)
0                    22308  22.5%
1                    6928   7.0%
2                    4434   4.5%
3                    3240   3.3%
4                    2507   2.5%
5                    2027   2.0%
6                    1806   1.8%
7                    1618   1.6%
8                    1430   1.4%
9                    1381   1.4%
Other values (2914)  51324  51.8%

Minimum 5 values

Value  Count  Frequency (%)
0      22308  22.5%
1      6928   7.0%
2      4434   4.5%
3      3240   3.3%
4      2507   2.5%

Maximum 5 values

Value  Count  Frequency (%)
25111  1      < 0.1%
21652  1      < 0.1%
16732  1      < 0.1%
16583  1      < 0.1%
14799  1      < 0.1%

likes_received
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED
ZEROS

Distinct: 2681
Distinct (%): 2.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 142.6893629
Minimum: 0
Maximum: 261197
Zeros: 24428
Zeros (%): 24.7%
Memory size: 773.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 1
Median: 8
Q3: 59
95-th percentile: 561
Maximum: 261197
Range: 261197
Interquartile range (IQR): 58

Descriptive statistics

Standard deviation: 1387.919613
Coefficient of variation (CV): 9.726861091
Kurtosis: 17384.94
Mean: 142.6893629
Median Absolute Deviation (MAD): 8
Skewness: 112.0745682
Sum: 14126675
Variance: 1926320.851
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value                Count  Frequency (%)
0                    24428  24.7%
1                    7305   7.4%
2                    4541   4.6%
3                    3347   3.4%
4                    2669   2.7%
5                    2373   2.4%
6                    1873   1.9%
7                    1680   1.7%
8                    1538   1.6%
9                    1351   1.4%
Other values (2671)  47898  48.4%

Minimum 5 values

Value  Count  Frequency (%)
0      24428  24.7%
1      7305   7.4%
2      4541   4.6%
3      3347   3.4%
4      2669   2.7%

Maximum 5 values

Value   Count  Frequency (%)
261197  1      < 0.1%
178166  1      < 0.1%
152014  1      < 0.1%
106025  1      < 0.1%
82623   1      < 0.1%

mobile_likes
Real number (ℝ≥0)

ZEROS

Distinct: 2396
Distinct (%): 2.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 106.1162995
Minimum: 0
Maximum: 25111
Zeros: 35056
Zeros (%): 35.4%
Memory size: 773.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 4
Q3: 46
95-th percentile: 481.9
Maximum: 25111
Range: 25111
Interquartile range (IQR): 46

Descriptive statistics

Standard deviation: 445.2529851
Coefficient of variation (CV): 4.195896268
Kurtosis: 360.9885806
Mean: 106.1162995
Median Absolute Deviation (MAD): 4
Skewness: 14.16123656
Sum: 10505832
Variance: 198250.2207
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value                Count  Frequency (%)
0                    35056  35.4%
1                    6297   6.4%
2                    3941   4.0%
3                    2917   2.9%
4                    2265   2.3%
5                    1794   1.8%
6                    1598   1.6%
7                    1395   1.4%
8                    1212   1.2%
9                    1149   1.2%
Other values (2386)  41379  41.8%

Minimum 5 values

Value  Count  Frequency (%)
0      35056  35.4%
1      6297   6.4%
2      3941   4.0%
3      2917   2.9%
4      2265   2.3%

Maximum 5 values

Value  Count  Frequency (%)
25111  1      < 0.1%
21652  1      < 0.1%
16732  1      < 0.1%
14039  1      < 0.1%
13529  1      < 0.1%

mobile_likes_received
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED
ZEROS

Distinct: 2004
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 84.1204913
Minimum: 0
Maximum: 138561
Zeros: 30003
Zeros (%): 30.3%
Memory size: 773.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 4
Q3: 33
95-th percentile: 317
Maximum: 138561
Range: 138561
Interquartile range (IQR): 33

Descriptive statistics

Standard deviation: 839.8894437
Coefficient of variation (CV): 9.984362083
Kurtosis: 15522.64932
Mean: 84.1204913
Median Absolute Deviation (MAD): 4
Skewness: 107.5312999
Sum: 8328181
Variance: 705414.2777
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value                Count  Frequency (%)
0                    30003  30.3%
1                    8243   8.3%
2                    4948   5.0%
3                    3608   3.6%
4                    2944   3.0%
5                    2383   2.4%
6                    2022   2.0%
7                    1745   1.8%
8                    1521   1.5%
9                    1437   1.5%
Other values (1994)  40149  40.6%

Minimum 5 values

Value  Count  Frequency (%)
0      30003  30.3%
1      8243   8.3%
2      4948   5.0%
3      3608   3.6%
4      2944   3.0%

Maximum 5 values

Value   Count  Frequency (%)
138561  1      < 0.1%
131244  1      < 0.1%
89911   1      < 0.1%
73333   1      < 0.1%
43410   1      < 0.1%

www_likes
Real number (ℝ≥0)

ZEROS

Distinct: 1726
Distinct (%): 1.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 49.96242538
Minimum: 0
Maximum: 14865
Zeros: 60999
Zeros (%): 61.6%
Memory size: 773.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 7
95-th percentile: 208
Maximum: 14865
Range: 14865
Interquartile range (IQR): 7

Descriptive statistics

Standard deviation: 285.5601519
Coefficient of variation (CV): 5.715498191
Kurtosis: 449.1484832
Mean: 49.96242538
Median Absolute Deviation (MAD): 0
Skewness: 16.91102529
Sum: 4946430
Variance: 81544.60033
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value                Count  Frequency (%)
0                    60999  61.6%
1                    4697   4.7%
2                    2760   2.8%
3                    1948   2.0%
4                    1419   1.4%
5                    1202   1.2%
6                    1081   1.1%
7                    897    0.9%
8                    792    0.8%
9                    757    0.8%
Other values (1716)  22451  22.7%

Minimum 5 values

Value  Count  Frequency (%)
0      60999  61.6%
1      4697   4.7%
2      2760   2.8%
3      1948   2.0%
4      1419   1.4%

Maximum 5 values

Value  Count  Frequency (%)
14865  1      < 0.1%
12903  1      < 0.1%
11077  1      < 0.1%
10763  1      < 0.1%
10627  1      < 0.1%

www_likes_received
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED
ZEROS

Distinct: 1636
Distinct (%): 1.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 58.56883125
Minimum: 0
Maximum: 129953
Zeros: 36864
Zeros (%): 37.2%
Memory size: 773.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 2
Q3: 20
95-th percentile: 227
Maximum: 129953
Range: 129953
Interquartile range (IQR): 20

Descriptive statistics

Standard deviation: 601.416348
Coefficient of variation (CV): 10.26853934
Kurtosis: 23812.2491
Mean: 58.56883125
Median Absolute Deviation (MAD): 2
Skewness: 126.257317
Sum: 5798490
Variance: 361701.6237
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value                Count  Frequency (%)
0                    36864  37.2%
1                    8513   8.6%
2                    5111   5.2%
3                    3586   3.6%
4                    2828   2.9%
5                    2317   2.3%
6                    1918   1.9%
7                    1602   1.6%
8                    1445   1.5%
9                    1373   1.4%
Other values (1626)  33446  33.8%

Minimum 5 values

Value  Count  Frequency (%)
0      36864  37.2%
1      8513   8.6%
2      5111   5.2%
3      3586   3.6%
4      2828   2.9%

Maximum 5 values

Value   Count  Frequency (%)
129953  1      < 0.1%
62103   1      < 0.1%
39605   1      < 0.1%
39213   1      < 0.1%
34039   1      < 0.1%

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, so for a linear relationship the steepness of the line does not affect the magnitude of r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
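
The calculation described above can be sketched in plain Python. This is a minimal illustration, not the profiler's implementation; the function name pearson_r is an assumption, and the input uses the age and dob_year values from the ten rows shown under "First rows".

```python
from math import sqrt


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # The 1/n factors of the covariance and the standard deviations cancel,
    # so plain sums of products are sufficient.
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# age and dob_year from the ten "First rows" of the sample.
age = [14, 14, 14, 14, 14, 14, 13, 13, 13, 13]
dob_year = [1999, 1999, 1999, 1999, 1999, 1999, 2000, 2000, 2000, 2000]

print(round(pearson_r(age, dob_year), 6))  # very close to -1: a perfect negative linear relation
```

In these rows age and dob_year always add up to 2013, so one is an exact linear function of the other. That is consistent with the High correlation warning for the pair and with their identical standard deviations and opposite-signed skewness in the summaries above.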

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures the monotonic correlation between two variables and is therefore better than Pearson's r at catching nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
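
As a rough sketch of that recipe, the code below ranks both variables (tied values share the average of their positions) and then applies the Pearson formula to the ranks. The helper names average_ranks, pearson_r, and spearman_rho are illustrative, and the cubic toy data is made up to show a relationship that is monotonic but not linear.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i..j, converted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def spearman_rho(xs, ys):
    """Pearson correlation of the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]  # y = x**3: monotonic but not linear

print(spearman_rho(x, y))     # essentially 1 for a strictly increasing relation
print(pearson_r(x, y) < 1.0)  # Pearson's r falls short of 1 on the same data
```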

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is then the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
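
The pair-counting definition can be sketched directly. The function below implements the simple form described here (often called tau-a), in which tied pairs count as neither concordant nor discordant; library implementations commonly use a tie-adjusted variant, so results on data with many ties may differ. The name kendall_tau and the toy data are illustrative.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """(concordant - discordant) / total pairs; tied pairs count as neither."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(xs)), 2))
    for i, j in pairs:
        # Positive product: both variables move in the same direction.
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / len(pairs)


x = [1, 2, 3, 4]
y = [1, 3, 2, 4]  # one swapped pair among six

print(kendall_tau(x, y))  # (5 - 1) / 6
```

This direct version compares every pair and therefore takes quadratic time, which is fine for illustration but slow on a column with 99003 observations.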

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available for Phik.

Missing values

Sample

First rows

index  userid   age  dob_day  dob_year  dob_month  gender  tenure  friend_count  friendships_initiated  likes  likes_received  mobile_likes  mobile_likes_received  www_likes  www_likes_received
0      2094382  14   19       1999      11         male    266.0   0             0                      0      0               0             0                      0          0
1      1192601  14   2        1999      11         female  6.0     0             0                      0      0               0             0                      0          0
2      2083884  14   16       1999      11         male    13.0    0             0                      0      0               0             0                      0          0
3      1203168  14   25       1999      12         female  93.0    0             0                      0      0               0             0                      0          0
4      1733186  14   4        1999      12         male    82.0    0             0                      0      0               0             0                      0          0
5      1524765  14   1        1999      12         male    15.0    0             0                      0      0               0             0                      0          0
6      1136133  13   14       2000      1          male    12.0    0             0                      0      0               0             0                      0          0
7      1680361  13   4        2000      1          female  0.0     0             0                      0      0               0             0                      0          0
8      1365174  13   1        2000      1          male    81.0    0             0                      0      0               0             0                      0          0
9      1712567  13   2        2000      2          male    171.0   0             0                      0      0               0             0                      0          0

Last rows

index  userid   age  dob_day  dob_year  dob_month  gender  tenure  friend_count  friendships_initiated  likes  likes_received  mobile_likes  mobile_likes_received  www_likes  www_likes_received
98993  1654565  19   15       1994      8          male    394.0   4538          4144                   4501   15088           4435          5961                   66         9127
98994  2063006  20   4        1993      1          female  402.0   1988          332                    7351   106025          7248          73333                  103        32692
98995  1132164  20   9        1993      10         female  699.0   3611          973                    4507   7768            4414          6909                   93         859
98996  1668695  24   25       1989      4          female  182.0   2938          1272                   6018   17765           5843          11708                  175        6057
98997  1458985  28   14       1985      12         female  290.0   2218          1618                   4626   10268           4290          4250                   336        6018
98998  1268299  68   4        1945      4          female  541.0   2118          341                    3996   18089           3505          11887                  491        6202
98999  1256153  18   12       1995      3          female  21.0    1968          1720                   4401   13412           4399          10592                  2          2820
99000  1195943  15   10       1998      5          female  111.0   2002          1524                   11959  12554           11959         11462                  0          1092
99001  1468023  23   11       1990      4          female  416.0   2560          185                    4506   6516            4506          5760                   0          756
99002  1397896  39   15       1974      5          female  397.0   2049          768                    9410   12443           9410          9530                   0          2913